60        Bioinformatics

in the sorted suffixes that corresponds to the BWT and F is the first column in the sorted

rotations. The L and F columns are separated as shown in Figure 2.9.

We need only the first (F) and last (L) columns to reverse the BWT to the original sorted

suffixes of the string. The BWT can be recovered using the LF mapping.

Since F must begin with “$”, we know that this character is the last one in the original

string. The “$” is preceded by the first character in L, which is “A”; so now we could infer

the last two characters “A$” in the original sequence. To infer the character that comes

before “A”, let us find the first “A” in F, which is in the second row, so the character in the

second row of L which is “G” is the character that comes before “A$”; so now we could

infer the last three characters “GA$” of the original string. Since this “G” is the first “G”

in L, it points to the first “G” in F (5th row); thus, the character that comes before “GA$”

is the character in the 5th row of L, which is “G”. So now the inferred last four characters

are “GGA$”. Since this “G” is the third “G” in L, it points to the third “G” in F, which is in

the 7th row, and the character in the 7th row of L, which is “T”, is the character that comes

before “GGA$” so the inferred last five characters are “TGGA$”. The rank of this “T” is

the first “T” in L which points to the first “T” in F in the 9th row, so the character which

comes before “TGGA$” is “C” in the 9th row of L. Thus, the inferred last 6 characters of

the original string are “CTGGA$”. We can continue using this L-to-F mapping until we

recover the entire sequence “CTTGGCTGGA$” as indicated by the arrows in Figure 2.10.

This reversion process is called backward search mechanism since it begins from the very

end of the string and moves backward. A computer algorithm usually performs this task.

We can recover the original string using the same LF mapping property but in a dif-

ferent way. Given the LF matrix, the string S is reconstructed starting from the end of the

FIGURE 2.9  L and F columns.

FIGURE 2.8  Constructing BWT from Suffix array.